AITopics | relu 0

http://papers.nips.cc/paper_files/paper/2021/file/043ab21fc5a1607b381ac3896176dac6-Paper.pdf

Neural Information Processing Systems, Apr-24-2026, 11:09:12 GMT

In theory, the choice of ReLU'(0) in [0,1] for a neural network has a negligible influence on both backpropagation and training. In practice, however, the default 32-bit precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) at several precision levels (16, 32, and 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations in backpropagation outputs, which occur around half of the time at 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. In our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also show that reconditioning approaches such as batch-norm or ADAM tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
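To make the role of the derivative at zero concrete, here is a minimal sketch (not the authors' code) of a one-neuron network whose pre-activation is exactly zero. The function names and the parameter `s`, standing for the value chosen for the derivative of ReLU at 0, are illustrative assumptions.

```python
def relu(z):
    """Forward pass: identical whatever derivative value is chosen at 0."""
    return z if z > 0 else 0.0


def relu_grad(z, s):
    """Derivative of ReLU; s in [0, 1] is the value assigned at z == 0."""
    if z > 0:
        return 1.0
    if z < 0:
        return 0.0
    return s


def grad_wrt_weight(w, x, b, s):
    """d/dw relu(w * x + b), computed by the chain rule."""
    z = w * x + b
    return relu_grad(z, s) * x


# Hypothetical values chosen so that z = 0.5 * 2.0 - 1.0 is exactly 0.0.
w, x, b = 0.5, 2.0, -1.0
g0 = grad_wrt_weight(w, x, b, s=0.0)  # gradient with derivative 0 at zero
g1 = grad_wrt_weight(w, x, b, s=1.0)  # gradient with derivative 1 at zero
print(relu(w * x + b), g0, g1)
```

The forward value is the same in both cases, while the backpropagated gradient differs. The choice only matters when a pre-activation lands exactly on zero, which is consistent with the abstract's observation that the effect depends on numerical precision.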

artificial intelligence, machine learning, relu, (18 more...)

Country: Europe > France (0.16)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

Wang, Fang, Ceravolo, Paolo, Damiani, Ernesto

arXiv.org Artificial Intelligence, Aug-6-2025

We propose HGCN(O), a self-tuning toolkit using Graph Convolutional Network (GCN) models for event sequence prediction. Featuring four GCN architectures (O-GCN, T-GCN, TP-GCN, TE-GCN) across the GCNConv and GraphConv layers, our toolkit integrates multiple graph representations of event sequences with different choices of node- and graph-level attributes and of temporal dependencies encoded via edge weights, optimising prediction accuracy and stability for balanced and unbalanced datasets. Extensive experiments show that GCNConv models excel on unbalanced data, while all models perform consistently on balanced data. Experiments also confirm the superior performance of HGCN(O) over traditional approaches. Applications include Predictive Business Process Monitoring (PBPM), which predicts future events or states of a business process based on event logs.

data mining, machine learning, natural language, (22 more...)

arXiv:2507.22524

Country:

Europe (1.00)
North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(4 more...)

Mechanistic Decomposition of Sentence Representations

Tehenan, Matthieu, Natarajan, Vikram, Michala, Jonathan, Lin, Milton, Opitz, Juri

arXiv.org Artificial Intelligence, Jun-11-2025

Sentence embeddings are central to modern NLP and AI systems, yet little is known about their internal structure. While we can compare these embeddings using measures such as cosine similarity, the contributing features are not human-interpretable, and the content of an embedding seems untraceable, as it is masked by complex neural transformations and a final pooling operation that combines individual token embeddings. To alleviate this issue, we propose a new method to mechanistically decompose sentence embeddings into interpretable components, by using dictionary learning on token-level representations. We analyze how pooling compresses these features into sentence representations, and assess the latent features that reside in a sentence embedding. This bridges token-level mechanistic interpretability with sentence-level analysis, making for more transparent and controllable representations. In our studies, we obtain several interesting insights into the inner workings of sentence embedding spaces, for instance, that many semantic and syntactic aspects are linearly encoded in the embeddings.
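As a rough illustration of why token-level dictionary features can survive pooling, the sketch below assumes mean pooling and exact sparse codes over a tiny hand-written dictionary. Both are simplifying assumptions for the example, not details taken from the paper, and all names and values are hypothetical.

```python
def combine(codes, dictionary):
    """Reconstruct a vector as sum_j codes[j] * dictionary[j]."""
    dim = len(dictionary[0])
    return [
        sum(c * atom[d] for c, atom in zip(codes, dictionary))
        for d in range(dim)
    ]


def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical dictionary with 2 atoms in a 3-dimensional embedding space.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
# Hypothetical codes (feature activations) for 3 tokens.
token_codes = [[2.0, 0.0], [0.0, 4.0], [1.0, 2.0]]

tokens = [combine(codes, dictionary) for codes in token_codes]
sentence = mean_pool(tokens)

# Because mean pooling is linear, pooling the codes first gives the same vector.
pooled_codes = mean_pool(token_codes)
sentence_from_codes = combine(pooled_codes, dictionary)
print(sentence, pooled_codes, sentence_from_codes)
```

Under these assumptions the sentence embedding is itself a combination of the same dictionary atoms, weighted by the averaged token codes.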

artificial intelligence, natural language, text processing, (19 more...)

arXiv:2506.04373

Country:

Asia (1.00)
Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

American Sign Language Video to Text Translation

Roy, Parsheeta, Han, Ji-Eun, Chouhan, Srishti, Thumu, Bhaavanaa

arXiv.org Artificial Intelligence, Feb-11-2024

Sign language to text is a crucial technology that can break down communication barriers for individuals with hearing difficulties. We replicate and try to improve on a recently published study. We evaluate models using BLEU and rBLEU metrics to ensure translation quality. During our ablation study, we found that the model's performance is significantly influenced by optimizers, activation functions, and label smoothing. Further research aims to refine visual feature capturing, enhance decoder utilization, and integrate pre-trained decoders for better translation outcomes. Our source code is available to facilitate replication of our results and encourage future research.

language translation, relu 0, translation, (13 more...)

arXiv:2402.07255

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Africa > Middle East > Morocco (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

GPT-4 Generated Narratives of Life Events using a Structured Narrative Prompt: A Validation Study

Lynch, Christopher J., Jensen, Erik, Munro, Madison H., Zamponi, Virginia, Martinez, Joseph, O'Brien, Kevin, Feldhaus, Brandon, Smith, Katherine, Reinhold, Ann Marie, Gore, Ross

arXiv.org Artificial Intelligence, Feb-8-2024

Large Language Models (LLMs) play a pivotal role in generating vast arrays of narratives, facilitating a systematic exploration of their effectiveness for communicating life events in narrative form. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 narratives using OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning models on the classified datasets. Leveraging these models, we extend our analysis to predict the classifications of the remaining 21,120 narratives. All the ML models excelled at classifying valid narratives as valid, but experienced challenges at simultaneously classifying invalid narratives as invalid. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.

classification, narrative, significance level, (15 more...)

arXiv:2402.05435

Country:

North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > United States > Virginia > Suffolk (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Evaluating CNN with Oscillatory Activation Function

Sharma, Jeevanshi

arXiv.org Artificial Intelligence, Nov-13-2022

CNNs owe their capability to learn complex, high-dimensional features from images to the non-linearity introduced by the activation function. Several advanced activation functions have been discovered to improve the training process of neural networks, as choosing an activation function is a crucial step in modeling. Recent research, inspired by the human brain cortex, has proposed using an oscillating activation function to solve classification problems. This paper explores the performance of the CNN architecture AlexNet on the MNIST and CIFAR10 datasets using an oscillatory activation function (GCU) and other commonly used activation functions such as ReLU, PReLU, and Mish.
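For reference, the activation functions named in the abstract can be sketched as scalar functions. The abstract does not define them; the forms below are the commonly cited definitions, with GCU taken to be the Growing Cosine Unit x * cos(x). The PReLU slope `a` is normally a learned parameter, so the default of 0.25 is only an illustrative assumption.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def prelu(x, a=0.25):
    """Parametric ReLU; the slope a for x <= 0 is learned in practice."""
    return x if x > 0 else a * x


def mish(x):
    """Mish: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * math.tanh(math.log1p(math.exp(x)))


def gcu(x):
    """Growing Cosine Unit: x * cos(x), an oscillatory activation."""
    return x * math.cos(x)


for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), prelu(value), mish(value), gcu(value))
```

Unlike the other three, GCU changes sign repeatedly as x grows, which is what makes it oscillatory. The naive `math.exp` in `mish` can overflow for very large inputs, so a production version would need a numerically stable softplus.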

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv:2211.06878

Country:

Asia > India > Uttar Pradesh > Aligarh (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Dimension-Free Average Treatment Effect Inference with Deep Neural Networks

Du, Xinze, Fan, Yingying, Lv, Jinchi, Sun, Tianshu, Vossler, Patrick

arXiv.org Machine Learning, Dec-2-2021

This paper investigates the estimation and inference of the average treatment effect (ATE) using deep neural networks (DNNs) in the potential outcomes framework. Under some regularity conditions, the observed response can be formulated as the response of a mean regression problem with both the confounding variables and the treatment indicator as the independent variables. Using this formulation, we investigate two methods for ATE estimation and inference based on the estimated mean regression function via DNN regression using a specific network architecture. We show that both DNN estimates of ATE are consistent with dimension-free consistency rates under some assumptions on the underlying true mean regression model. Our model assumptions accommodate the potentially complicated dependence structure of the observed response on the covariates, including latent factors and nonlinear interactions between the treatment indicator and confounding variables. We also establish the asymptotic normality of our estimators based on the idea of sample splitting, ensuring precise inference and uncertainty quantification. Simulation studies and a real data application justify our theoretical findings and support our DNN estimation and inference methods.
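One generic way to turn an estimated mean regression function into an ATE estimate is a plug-in average of predicted treated-minus-control responses. The sketch below illustrates that idea only; it is not necessarily either of the paper's two estimators, and `toy_mu_hat` is a hypothetical stand-in for a fitted DNN regression.

```python
def plug_in_ate(mu_hat, covariates):
    """Average of mu_hat(x, 1) - mu_hat(x, 0) over the observed covariates."""
    differences = [mu_hat(x, 1) - mu_hat(x, 0) for x in covariates]
    return sum(differences) / len(differences)


def toy_mu_hat(x, t):
    """Hypothetical fitted regression with a treatment-covariate interaction.

    The implied individual treatment effect is 2 + x.
    """
    return 1.0 + 0.5 * x + t * (2.0 + x)


# Hypothetical one-dimensional confounder values for four units.
xs = [0.0, 1.0, 2.0, 3.0]
ate = plug_in_ate(toy_mu_hat, xs)
print(ate)
```

Any regression model that accepts the covariates and the treatment indicator as inputs could be passed in place of `toy_mu_hat`.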

estimate sigmoid 0, relu 0, sigmoid 0, (14 more...)

arXiv:2112.01574

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Neural Network Design for Energy-Autonomous AI Applications using Temporal Encoding

Mileiko, Sergey, Bunnam, Thanasin, Xia, Fei, Shafik, Rishad, Yakovlev, Alex, Das, Shidhartha

arXiv.org Artificial Intelligence, Oct-15-2019

Neural Networks (NNs) are steering a new generation of artificial intelligence (AI) applications at the micro-edge. Examples include wireless sensors, wearables and cybernetic systems that collect data and process them to support real-world decisions and controls. For energy autonomy, these applications are typically powered by energy harvesters. As harvesters and other power sources which provide energy autonomy inevitably have power variations, the circuits need to robustly operate over a dynamic power envelope. In other words, the NN hardware needs to be able to function correctly under unpredictable and variable supply voltages. In this paper, we propose a novel NN design approach using the principle of pulse width modulation (PWM). PWM signals represent information with their duty cycle values which may be made independent of the voltages and frequencies of the carrier signals. We design a PWM-based perceptron which can serve as the fundamental building block for NNs, by using an entirely new method of realising arithmetic in the PWM domain. We analyse the proposed approach building from a 3x3 perceptron circuit to a complex multi-layer NN. Using handwritten character recognition as an exemplar of AI applications, we demonstrate the power elasticity, resilience and efficiency of the proposed NN design in the presence of functional and parametric variations including large voltage variations in the power supply.
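The claim that a duty cycle can carry a value independently of carrier voltage and frequency can be illustrated with a small simulation. This is a software sketch under idealised assumptions (a noise-free sampled square wave), not the paper's circuit or its PWM arithmetic, and the helper names are hypothetical.

```python
def pwm_wave(duty, period_samples, amplitude, cycles=4):
    """Sampled ideal PWM signal: high for duty * period, then low."""
    high = round(duty * period_samples)
    one_cycle = [amplitude] * high + [0.0] * (period_samples - high)
    return one_cycle * cycles


def measured_duty(samples):
    """Fraction of samples above half of the peak level."""
    threshold = max(samples) / 2
    return sum(1 for s in samples if s > threshold) / len(samples)


# Same encoded value, different carrier period and supply amplitude.
slow_low_voltage = measured_duty(pwm_wave(0.25, 100, amplitude=1.0))
fast_high_voltage = measured_duty(pwm_wave(0.25, 40, amplitude=3.3))
print(slow_low_voltage, fast_high_voltage)
```

Both measurements recover the same duty cycle, since the value is encoded in the ratio of high time to period rather than in the signal level or the carrier frequency.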

duty cycle, perceptron, voltage, (16 more...)

arXiv:1910.07492

Country:

Europe (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.82)

Industry:

Energy (0.87)
Information Technology (0.68)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.77)

Evaluation of Complex-Valued Neural Networks on Real-Valued Classification Tasks

Mönning, Nils, Manandhar, Suresh

arXiv.org Machine Learning, Nov-29-2018

Complex-valued neural networks are not a new concept; however, the use of real-valued models has often been favoured over complex-valued models due to difficulties in training and performance. When comparing real-valued versus complex-valued neural networks, existing literature often ignores the number of parameters, resulting in comparisons of neural networks with vastly different sizes. We find that when real and complex neural networks of similar capacity are compared, complex models perform equal to or slightly worse than real-valued models for a range of real-valued classification tasks. The use of complex numbers allows neural networks to handle noise on the complex plane. When classifying real-valued data with a complex-valued neural network, the imaginary parts of the weights follow their real parts. This behaviour is indicative of a task that does not require a complex-valued model. We further investigated this in a synthetic classification task. We can transfer many activation functions from the real to the complex domain using different strategies. The weight initialisation of complex neural networks, however, remains a significant problem.
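The point about parameter counts can be made concrete with a short calculation. The sketch assumes a plain dense layer with a bias and counts each complex weight as two real parameters (real and imaginary parts); the layer sizes are hypothetical and this is not the paper's capacity-matching procedure.

```python
def dense_real_param_count(n_in, n_out, complex_valued=False):
    """Number of real-valued parameters in a dense layer with bias."""
    reals_per_parameter = 2 if complex_valued else 1
    return reals_per_parameter * (n_in * n_out + n_out)


real_layer = dense_real_param_count(64, 32)
complex_layer = dense_real_param_count(64, 32, complex_valued=True)
print(real_layer, complex_layer)
```

With identical layer widths the complex layer holds twice as many real-valued parameters, so comparing networks by width alone compares models of different sizes.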

artificial intelligence, machine learning, neural network, (18 more...)

arXiv:1811.12351

Country: Europe (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
